Fixup merge master into Chinese Word Segmentation by CrazySteve0605 · Pull Request #20055 · nvaccess/nvda

CrazySteve0605 · 2026-05-05T00:44:06Z

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

cary-rowen

Was this an unexpected change?

cary-rowen · 2026-05-05T02:15:33Z

Hi @CrazySteve0605, I reviewed this PR against the Copilot comments from #19166. Most of the explicit comments are handled correctly, but I think two issues still need attention.

First, the braille offset-converter issue is still unresolved. braille.py still keeps only one converter. When Chinese word-segmentation spacing is applied first and Unicode normalization is also enabled, the segmentation converter is overwritten by the normalization converter. As a result, brailleToRawPos / rawToBraillePos are only mapped back through the last converter, so cursor routing / selection mapping can still be wrong. This probably needs converter composition, or applying the transformations while keeping a combined mapping back to the original raw text.
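
The composition idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ListOffsetConverter` and `ChainedOffsetConverter` are hypothetical names, not NVDA classes, and the offset tables are hand-written rather than produced by real segmentation or normalization.

```python
from bisect import bisect_left


class ListOffsetConverter:
	"""Hypothetical converter backed by a non-decreasing output-to-input offset list."""

	def __init__(self, outputToInput: list[int]):
		self._outToIn = list(outputToInput)

	def toInput(self, outputOffset: int) -> int:
		# Each output position records the input position it came from.
		return self._outToIn[outputOffset]

	def toOutput(self, inputOffset: int) -> int:
		# First output position that maps to inputOffset or later, clamped to the end.
		return min(bisect_left(self._outToIn, inputOffset), len(self._outToIn) - 1)


class ChainedOffsetConverter:
	"""Hypothetical composition: forward through every stage, backward in reverse order."""

	def __init__(self, converters):
		self._converters = list(converters)

	def toOutput(self, rawOffset: int) -> int:
		offset = rawOffset
		for converter in self._converters:
			offset = converter.toOutput(offset)
		return offset

	def toInput(self, finalOffset: int) -> int:
		offset = finalOffset
		for converter in reversed(self._converters):
			offset = converter.toInput(offset)
		return offset


# Stage 1 (assumed): a separator is inserted after raw offset 1, so 4 chars become 5.
segmentation = ListOffsetConverter([0, 1, 1, 2, 3])
# Stage 2 (assumed): the first char expands into two during normalization, so 5 become 6.
normalization = ListOffsetConverter([0, 0, 1, 2, 3, 4])
chain = ChainedOffsetConverter([segmentation, normalization])

print(chain.toOutput(2))  # final offset for raw offset 2
print(chain.toInput(4))  # raw offset for final offset 4
print(normalization.toInput(4))  # what a single, last-only converter would report
```

In this toy data, mapping final offset 4 back through only the last stage lands on offset 3, while the chained mapping returns raw offset 2, which is the kind of mismatch that would affect cursor routing.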

Second, _initCppJieba now calls cls._lib.initJieba(dictDir), but it does not check the returned boolean. Since the C++ side now returns false on initialization failure, the Python side should probably treat that as failure too, e.g. set cls._lib = None and log/debugWarn. Otherwise NVDA may think cppjieba is available even though it was not initialized successfully.
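
A minimal sketch of the suggested check, assuming the native call returns a truthy value on success. `FakeJiebaLib` and `SegmentationBackend` are hypothetical stand-ins for illustration, not the real binding or the strategy class in this PR, and the standard `logging` module is used in place of NVDA's logger.

```python
import logging

log = logging.getLogger("wordSegSketch")


class FakeJiebaLib:
	"""Stand-in for the native library (assumption: initJieba returns a boolean)."""

	def __init__(self, succeed: bool):
		self._succeed = succeed

	def initJieba(self, dictDir: str) -> bool:
		return self._succeed


class SegmentationBackend:
	"""Hypothetical holder for the loaded library."""

	_lib = None

	@classmethod
	def initialize(cls, lib, dictDir: str) -> bool:
		cls._lib = lib
		try:
			ok = bool(cls._lib.initJieba(dictDir))
		except Exception:
			log.exception("initJieba raised an exception")
			ok = False
		if not ok:
			# Treat a false return value as failure so later code does not
			# assume the segmenter is usable.
			log.warning("cppjieba initialization failed for %s", dictDir)
			cls._lib = None
		return ok

	@classmethod
	def isAvailable(cls) -> bool:
		return cls._lib is not None
```

With this shape, a failed initialization leaves `isAvailable()` returning False, so callers can fall back to unsegmented output.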

The rest of the fixes look broadly in the right direction to me.

cary-rowen · 2026-05-05T02:19:40Z

I will test the actual experience with Focus 80 later. The above is just a response to some points that Copilot raised that may need to be considered.

cary-rowen · 2026-05-05T16:17:38Z

Hi @CrazySteve0605

While testing Chinese braille word segmentation, I found a regression: some NVDA built-in braille state abbreviations are being split by the word segmentation logic. For example, the checked state for a checkbox should remain ⣏⣿⣹, but with this feature enabled it becomes ⣏ ⣿ ⣹, with unexpected spaces inserted between the braille cells.

This seems to happen because WordSegWithSeparatorOffsetConverter is applied to the entire rawText in braille.Region.update(). That text contains not only user-facing content, but also NVDA-generated braille state abbreviations. The current segmentation logic avoids inserting spaces next to punctuation, but Braille Pattern characters are Unicode symbols, so they are not protected by that rule.

I think this should be fixed in this PR. A reasonable approach would be to avoid inserting word-segmentation separators between Braille Pattern characters, or more generally avoid inserting separators across symbol boundaries. It would also be good to add regression tests to ensure ⣏⣿⣹ remains unchanged, while normal Chinese text is still segmented as expected.
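
One way to express the "no separators across symbol boundaries" rule is sketched below, together with the two regression checks mentioned above. This is an assumption-laden illustration: `joinSegments` and `isProtected` are hypothetical helpers, and the input lists stand in for whatever tokens the segmenter would actually produce.

```python
import unicodedata


def isProtected(ch: str) -> bool:
	"""True for whitespace, punctuation (P*) and symbols (S*).

	Braille Pattern characters (U+2800 to U+28FF) have general category "So",
	so they count as symbols here.
	"""
	return ch.isspace() or unicodedata.category(ch)[0] in ("P", "S")


def joinSegments(segments: list[str], sep: str = " ") -> str:
	"""Join tokens with sep, skipping it next to protected characters."""
	out: list[str] = []
	for seg in segments:
		if not seg:
			continue
		# out[-1] is always the previous token, because sep is only ever
		# appended immediately before a token.
		if out and not (isProtected(out[-1][-1]) or isProtected(seg[0])):
			out.append(sep)
		out.append(seg)
	return "".join(out)


# Regression-style checks: braille state abbreviation stays intact,
# ordinary Chinese words are still separated.
assert joinSegments(["⣏", "⣿", "⣹"]) == "⣏⣿⣹"
assert joinSegments(["你好", "世界"]) == "你好 世界"
```

Checking the general category rather than a fixed braille range also covers other symbols that NVDA might place in the raw text, at the cost of never separating around any symbol.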

Btw, please remember to remove the irrelevant .md file changes from this PR.

Thanks

cary-rowen · 2026-05-05T16:24:26Z

Regarding Braille status abbreviations, please refer to this section in the user guide.

cary-rowen · 2026-05-09T00:17:17Z

Thanks @CrazySteve0605

The fix works for me.
Is it ready for review?

CrazySteve0605 · 2026-05-09T02:43:58Z

Hi @cary-rowen, thanks for the reviews and testing.

CrazySteve0605 and others added 5 commits May 5, 2026 01:14

Add null checks and exception handling in JiebaSingleton methods

2b17c5b

Add configuration option for eager Chinese word segmentation initialization and corresponding tests

34bbf4c

Fix logging and return None for unknown word segmentation standard

c537a6d

Refactor word segmentation strategy and improve error handling in ChineseWordSegmentationStrategy

3f2bb6b

Pre-commit auto-fix

b34a049

cary-rowen reviewed May 5, 2026

Comment thread projectDocs/issues/githubIssueTemplateExplanationAndExamples.md

cary-rowen reviewed May 5, 2026

Comment thread projectDocs/issues/readme.md

cary-rowen suggested changes May 5, 2026

CrazySteve0605 and others added 3 commits May 5, 2026 16:48

Add error handling for cppjieba initialization failure in ChineseWordSegmentationStrategy

dff373d

Refactor braille update method

64290e4

- use a list of converters for improved processing
- add unit test for Chinese word segmentation and Unicode normalization offsets

Pre-commit auto-fix

8a5c3db

Enhance punctuation handling in segmentedText method to include additional Unicode (e.g. braille) categories

cc5a950

CrazySteve0605 marked this pull request as ready for review May 9, 2026 02:42

CrazySteve0605 requested review from a team as code owners May 9, 2026 02:42

CrazySteve0605 requested review from Qchristensen and seanbudd and removed request for a team May 9, 2026 02:42

seanbudd merged commit fab3060 into nvaccess:try-chineseWordSegmentation-staging May 11, 2026
34 of 37 checks passed

github-actions Bot added this to the 2026.2 milestone May 11, 2026

seanbudd mentioned this pull request May 12, 2026

[WIP] Merge Chinese Word Segmentation work #19166

Draft

seanbudd merged 9 commits into nvaccess:try-chineseWordSegmentation-staging from CrazySteve0605:fixup-mergeMaster

CrazySteve0605 commented May 5, 2026

Uh oh!

Uh oh!

Uh oh!

cary-rowen left a comment

Uh oh!

cary-rowen commented May 5, 2026 •

edited

Loading

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 5, 2026 •

edited

Loading

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 9, 2026

Uh oh!

CrazySteve0605 commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

CrazySteve0605 commented May 5, 2026

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Uh oh!

Uh oh!

Uh oh!

cary-rowen left a comment

Choose a reason for hiding this comment

Uh oh!

cary-rowen commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cary-rowen commented May 5, 2026

Uh oh!

cary-rowen commented May 9, 2026

Uh oh!

CrazySteve0605 commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cary-rowen commented May 5, 2026 •

edited

Loading

cary-rowen commented May 5, 2026 •

edited

Loading